
@PSeitz (Contributor) commented Oct 31, 2025

In lz4_flex, copy_within is called with a fixed size to avoid calls to libc memmove.
However, I still see calls to libc in the call stack. I think it's because copy_within is missing #[inline].

lz4_flex uses the equivalent of this:
https://godbolt.org/z/cKqYYvKbT
(in this example, copy_within does get inlined)
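The fixed-size pattern can be sketched as below. This assumes a caller that copies a constant-length block within a single buffer. The helper name copy_16 and the length 16 are hypothetical illustrations, not lz4_flex's actual code. Whether the memmove call actually disappears depends on LLVM inlining copy_within into the helper.

```rust
/// Hypothetical helper (not lz4_flex's actual code): copies exactly 16
/// bytes from `src` to `dst` within the same buffer.
///
/// `copy_within` has memmove semantics, so the source and destination
/// ranges may overlap. It panics if either range is out of bounds.
pub fn copy_16(buf: &mut [u8], src: usize, dst: usize) {
    // The length (16) is a compile-time constant. Only if `copy_within`
    // is inlined into this function can LLVM see that constant and
    // possibly replace the memmove call with a few plain loads/stores.
    buf.copy_within(src..src + 16, dst);
}
```

For example, calling copy_16(&mut buf, 0, 4) on a 20-byte buffer shifts bytes 0..16 to positions 4..20. Because copy_within behaves like memmove, the overlapping ranges are handled correctly.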

rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-libs (Relevant to the library team, which will review and decide on the PR/issue.) labels Oct 31, 2025
@rustbot (Collaborator) commented Oct 31, 2025

r? @joboet

rustbot has assigned @joboet.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@joboet (Member) commented Nov 1, 2025

Does the assembly contain calls to copy_within or to memmove? #[inline] will only help in the first case, not the second. In general, if LLVM decides to use memmove, that's mostly the right call – at least on GNU/Linux, memmove takes advantage of the SIMD features of the current CPU, which can end up being much faster than an unrolled implementation that can only use the CPU features of the target.

@PSeitz (Contributor, Author) commented Nov 1, 2025


It contains calls to memmove. Inlining will also help in this case, because without inlining, LLVM loses the fixed-size information.
In trivial cases, LLVM replaces calls to memmove with inline instructions, since e.g. an 8-byte copy done inline is already faster than the libc function call.

Here's an example of both cases; the fixed 18-byte version does not contain a call to memmove:
https://godbolt.org/z/1xz7Gevbd
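One way to compare both cases in your own assembly output is a pair of functions like the ones below. The function names are hypothetical, and the 18-byte length only mirrors the example linked above. #[inline(never)] keeps each function as a separate symbol so its body is easy to find. It does not stop copy_within from being inlined *into* it. The expectation, not a guarantee, is that only the runtime-length version keeps a call to memmove.

```rust
/// Fixed-size variant (hypothetical): the length 18 is a constant inside
/// this function, so once `copy_within` is inlined here LLVM knows the
/// exact copy size and may expand the memmove inline.
#[inline(never)]
pub fn copy_fixed_18(buf: &mut [u8], src: usize, dst: usize) {
    buf.copy_within(src..src + 18, dst);
}

/// Runtime-length variant (hypothetical): `len` is only known at run
/// time, so LLVM generally has to emit a call to memmove.
#[inline(never)]
pub fn copy_dynamic(buf: &mut [u8], src: usize, dst: usize, len: usize) {
    buf.copy_within(src..src + len, dst);
}
```

Both functions compute the same result for len == 18. Only the information available to the optimizer differs.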

@Noratrieb (Member) commented

If your assembly does not include calls to copy_within, then this seems unlikely to help. It's also not that important: as a generic function, copy_within is already eligible for inlining in the calling crate, so the attribute only slightly adjusts the inlining heuristics. Have you checked that this PR helps?
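The sketch below shows why copy_within is generic and where the memmove comes from. The function is a hypothetical stand-in that only approximates the standard library's implementation. It is generic over R: RangeBounds<usize>, so it is typically instantiated in the calling crate, which is what makes such functions inlinable there without #[inline]. The actual copy goes through std::ptr::copy, which has memmove semantics. The length count is a compile-time constant only if the caller's range is constant and the function is inlined.

```rust
use std::ops::{Bound, RangeBounds};

/// Simplified, hypothetical stand-in for `<[T]>::copy_within`.
/// Being generic over `R`, it is monomorphized in whichever crate calls it.
pub fn my_copy_within<T: Copy, R: RangeBounds<usize>>(buf: &mut [T], src: R, dest: usize) {
    // Resolve the range bounds into a half-open [start, end) interval.
    let start = match src.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s.checked_add(1).expect("range start overflow"),
        Bound::Unbounded => 0,
    };
    let end = match src.end_bound() {
        Bound::Included(&e) => e.checked_add(1).expect("range end overflow"),
        Bound::Excluded(&e) => e,
        Bound::Unbounded => buf.len(),
    };
    assert!(start <= end, "src start is greater than src end");
    assert!(end <= buf.len(), "src is out of bounds");
    let count = end - start;
    assert!(dest <= buf.len() - count, "dest is out of bounds");
    // SAFETY: both ranges were bounds-checked above, and `ptr::copy`
    // (memmove semantics) permits the source and destination to overlap.
    unsafe {
        let ptr = buf.as_mut_ptr();
        std::ptr::copy(ptr.add(start), ptr.add(dest), count);
    }
}
```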
